Stylometric Analyses using Dirichlet Process Mixture Models

Authors

  • Paramjit S. Gill
  • I. K. Barber
  • Tim B. Swartz
Abstract

Stylometry refers to the statistical analysis of literary style of authors based on the characteristics of expression in their writings. We propose an approach to stylometry based on a Bayesian Dirichlet process mixture model using multinomial word frequency data. The parameters of the multinomial distribution of word frequency data are the “word prints” of the author. Our approach is based on model-based clustering of the vectors of probability values of the multinomial distribution. The resultant clusters identify different writing styles that assist in author attribution for disputed works in a corpus. As a test case, the methodology is applied to the problem of authorship attribution involving the Federalist papers. Our results are consistent with previous stylometric analyses of these papers.
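The clustering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' sampler: it assumes a collapsed Gibbs scheme in which each text's word-count vector is assigned to a cluster, the per-cluster multinomial probabilities are integrated out under a symmetric Dirichlet prior, and new clusters open according to a Chinese restaurant process. The function names, the hyperparameters `alpha` and `beta`, and the toy word counts are all hypothetical.

```python
import math
import random

def log_predictive(x, cluster_counts, beta):
    """Log Dirichlet-multinomial predictive of count vector x given the pooled
    word counts of a cluster. The multinomial coefficient is dropped because
    it is the same for every candidate cluster."""
    total_c = sum(cluster_counts) + beta * len(x)
    lp = math.lgamma(total_c) - math.lgamma(total_c + sum(x))
    for c, xv in zip(cluster_counts, x):
        lp += math.lgamma(c + beta + xv) - math.lgamma(c + beta)
    return lp

def dp_cluster(docs, alpha=1.0, beta=0.5, sweeps=50, seed=0):
    """Collapsed Gibbs sampling for a DP mixture of multinomials.
    docs: list of equal-length word-count vectors. Returns one label per doc."""
    rng = random.Random(seed)
    V = len(docs[0])
    labels = [0] * len(docs)  # start with every text in a single cluster
    counts = {0: [sum(d[v] for d in docs) for v in range(V)]}
    sizes = {0: len(docs)}
    next_id = 1
    for _ in range(sweeps):
        for i, x in enumerate(docs):
            # Remove text i from its current cluster.
            k = labels[i]
            sizes[k] -= 1
            counts[k] = [c - xv for c, xv in zip(counts[k], x)]
            if sizes[k] == 0:
                del sizes[k], counts[k]
            # Weight of each existing cluster, plus one brand-new cluster (None).
            cands = list(sizes) + [None]
            logw = [math.log(sizes[c]) + log_predictive(x, counts[c], beta)
                    for c in sizes]
            logw.append(math.log(alpha) + log_predictive(x, [0] * V, beta))
            m = max(logw)
            weights = [math.exp(w - m) for w in logw]
            choice = rng.choices(cands, weights=weights)[0]
            if choice is None:
                choice = next_id
                next_id += 1
                sizes[choice] = 0
                counts[choice] = [0] * V
            sizes[choice] += 1
            counts[choice] = [c + xv for c, xv in zip(counts[choice], x)]
            labels[i] = choice
    return labels

# Toy counts for four function words in six texts (made-up numbers):
# the first three texts share one usage profile, the last three another.
docs = [
    [200, 120, 50, 30], [210, 110, 45, 35], [190, 125, 55, 30],
    [40, 60, 180, 120], [35, 65, 170, 130], [45, 55, 185, 115],
]
labels = dp_cluster(docs, alpha=0.5)
print(labels)
```

Texts with similar word-usage profiles should tend to share a label, and a disputed text would be attributed by checking which known author's texts it clusters with. The label values themselves are arbitrary identifiers, and a single final Gibbs state is only one posterior draw; a fuller analysis would summarize co-clustering frequencies across many sweeps.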

Similar resources

Bayesian Analysis of a Random Link Function in Binary Response Regression

Binary response regression is a useful technique for analyzing categorical data. Popular binary models use special link functions such as the logit or the probit link. We assume that the inverse link function H is a random member of the class of normal scale mixture cdfs. We propose three different models for this random H: (i) H is a finite scale mixture with a Dirichlet distribution prior on th...

Dirichlet Process

The Dirichlet process is a prior used in nonparametric Bayesian models of data, particularly in Dirichlet process mixture models (also known as infinite mixture models). It is a distribution over distributions, i.e. each draw from a Dirichlet process is itself a distribution. It is called a Dirichlet process because it has Dirichlet distributed finite dimensional marginal distributions, just as...
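The statement that each draw from a Dirichlet process is itself a distribution can be made concrete with the stick-breaking construction. The sketch below is illustrative only: the function name, the truncation tolerance `tol`, and the choice of a standard normal base distribution are assumptions, not part of the quoted abstract.

```python
import random

def stick_breaking(alpha, base_sampler, tol=1e-6, rng=None):
    """Approximate draw from DP(alpha, base): returns atoms and their weights.
    The infinite sequence is truncated once the leftover stick is below tol."""
    rng = rng or random.Random(0)
    atoms, weights = [], []
    remaining = 1.0  # length of stick not yet assigned to any atom
    while remaining > tol:
        v = rng.betavariate(1.0, alpha)   # fraction of the leftover stick to break off
        weights.append(remaining * v)
        atoms.append(base_sampler(rng))   # atom location drawn from the base distribution
        remaining *= 1.0 - v
    return atoms, weights

# One draw with a standard normal base distribution (an arbitrary choice).
atoms, weights = stick_breaking(alpha=2.0, base_sampler=lambda r: r.gauss(0.0, 1.0))
print(len(atoms), sum(weights))
```

The returned atoms and weights define a discrete distribution. Smaller values of `alpha` concentrate the mass on fewer atoms, which is what makes the process useful as a clustering prior.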

Hierarchical Double Dirichlet Process Mixture of Gaussian Processes

We consider an infinite mixture model of Gaussian processes that share mixture components between nonlocal clusters in data. Meeds and Osindero (2006) use a single Dirichlet process prior to specify a mixture of Gaussian processes using an infinite number of experts. In this paper, we extend this approach to allow for experts to be shared non-locally across the input domain. This is accomplishe...

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures

We study a Bayesian framework for density modeling with mixture of exponential family distributions. Our contributions:

  • A variational Bayesian solution for finite mixture models
  • Show that finite mixture models (with a Bayesian setting) can determine the mixture number automatically
  • Justify this result with connections to Dirichlet Process mixture models
  • A fast variational Bayesian solutio...

Time-sensitive Dirichlet process mixture models

We introduce Time-Sensitive Dirichlet Process Mixture models for clustering. The models allow infinite mixture components just like standard Dirichlet process mixture models. However, they also have the ability to model time correlations between instances. Research supported in part by NSF grants NSF-CCR 0122481, NSF-IIS 0312814, and NSF-IIS 0427206. Zoubin Ghahramani was supported at CMU by DARP...


Publication date: 2011